Methods for using a speech to obtain additional information

ABSTRACT

An item of information (212) is transmitted to a distal computer (220), translated to a different sense modality and/or language (222) in substantially real time, and the translation (222) is transmitted back to the location (211) from which the item was sent. The device sending the item is preferably a wireless device, and more preferably a cellular or other telephone (210). The device receiving the translation is also preferably a wireless device, and more preferably a cellular or other telephone, and may advantageously be the same device as the sending device. The item of information (212) preferably comprises a sentence of human speech having at least ten words, and the translation is a written expression of the sentence. All of the steps of transmitting the item of information, executing the program code, and transmitting the translated information preferably occur in less than 60 seconds of elapsed time.

PRIORITY

This application is a Continuation under 35 U.S.C. §120 of U.S. patent application Ser. No. 13/572,224, filed Aug. 10, 2012, which is a divisional of U.S. patent application Ser. No. 13/426,074, filed Mar. 21, 2012, now U.S. Pat. No. 9,400,785, which is a divisional of U.S. patent application Ser. No. 10/466,202 filed Sep. 6, 2006, now U.S. Pat. No. 8,165,867, which is the U.S. National Phase of PCT/US00/25613 filed Sep. 15, 2000, all of which are incorporated herein by reference in their entirety.

FIELD OF THE INVENTION

The field of the invention is remote computing.

BACKGROUND OF THE INVENTION

As processing speeds continue to improve and data storage becomes ever less expensive, many sophisticated applications that were previously only available on mainframe or desktop computers have been ported to laptop computers and other portable electronic equipment. Many applications have even been ported to hand-held electronic devices as well, including hand-held computers, digital telephones, personal digital assistants (PDAs), and so forth. For example, personal databases with limited search capabilities are now included in cellular phones, and word processing can now be performed in PDAs.

There are, however, several applications that are presently difficult or impossible to realize on hand-held electronic devices, and are only poorly realized even on larger systems such as desktop computers. Due to the large volumes of data involved, and the need to process at very high speeds, a particularly difficult application is voice recognition. Some attempts have been made in that direction, but all of them suffer from one or more disadvantages.

At the low end, limited word or phrase recognition capabilities are sometimes provided in cell phones. Such systems can usually recognize only a few words (e.g., the numerals 0-9, and specialized key words such as a person's name, or the commands “dial” or “open file patentapp.doc”). Such systems are particularly advantageous where only rudimentary recognition capabilities are needed, or where only very limited data storage capability or computing power is available. However, an obvious shortcoming of word or phrase recognition systems is that their usability is limited to a small, preprogrammed vocabulary and at most a few custom words. Moreover, word or phrase recognition systems often fail to recognize personal speech patterns or accents.

At the higher end, speech recognition programs are currently available for operation on laptop computers. As used herein, both “speech recognition” and “word or phrase recognition” are considered to be categories of voice recognition. “Speech recognition”, however, is limited to systems having a vocabulary of at least 200 words, and in which individual words are interpreted in the context of surrounding words. For example, speech recognition would correctly interpret phrases such as “I have been to the beach”, whereas a word or phrase recognition system may substitute “bean” for “been”.

As with other computer software applications, most of the development effort is being directed towards porting the more sophisticated speech recognition to smaller and smaller devices. It may well be that within a decade true speech recognition will be available even on hand-held electronic devices.

What is not presently appreciated, however, is that porting sophisticated software to portable electronic devices may not be desirable. Cell phones, for example, need only relatively rudimentary electronics to support the required communications, and placing sophisticated storage and processing in cell phones may be a waste of money. Moreover, no matter how sophisticated the software and hardware become in hand-held and other portable devices, there will always be a perceived need for additional capabilities. Larger or specialized vocabularies may be desired, as well as recognition capabilities for different accents and languages, and perhaps even language translation capabilities. Still further, it is impractical to install voice recognition in all the myriad types of devices that may advantageously utilize it. For example, voice recognition may be useful in VCRs and CD players, kitchen and other household appliances such as toasters and washing machines, automobiles, and so forth.

Thus, while it has been known to translate information in a first sense modality and language into a second sense modality and language on a single local computer, it has not been appreciated that the translation can be performed in a “remote computing” manner, thereby concentrating the computing power cost-effectively. Consequently, there is a need to provide voice recognition capabilities, and especially speech recognition capabilities, to myriad electronic devices without actually installing all of the required hardware and software in all such devices.

SUMMARY OF THE INVENTION

The present invention provides systems and methods in which an item of information is transmitted to a distal computer, translated to a different sense modality and/or language in substantially real time, and the translation is transmitted back to the location from which the item was sent.

The device sending the item is preferably a wireless device, and more preferably a cellular or other telephone. The device receiving the translation is also preferably a wireless device, and more preferably a cellular or other telephone, and may advantageously be the same device as the sending device. The item of information preferably comprises a sentence of human speech having at least ten words, and the translation is a written expression of the sentence. All of the steps of transmitting the item of information, executing the program code, and transmitting the translated information preferably occur in less than 60 seconds of elapsed time, and more preferably in less than 30 seconds.

Various objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments of the invention, along with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an exemplary schematic of a method of changing the sense modality of an item of information according to the inventive subject matter.

FIG. 2 is an exemplary embodiment of a method of changing the sense modality of an item of information according to the inventive subject matter.

DETAILED DESCRIPTION

As used herein, the term “sense modality” refers to the manner in which information is perceived by a human being. There are five sense modalities: sight, sound, taste, smell, and touch. Obviously, different aspects of information may be expressed in multiple sense modalities at the same time. A conversation between two people, for example, may be perceived as both sound (spoken language) and sight (hand gestures). Similarly, music can be perceived as both sound (auditorily perceived vibration) and touch (tactually perceived vibration).

Information in each of the five sense modalities can be expressed in numerous languages, with the term “language” being interpreted very broadly. Information expressed in the sight modality, for example, can be expressed in various text languages as well as various graphics languages. Exemplary text languages include the various character sets of human languages (Roman, Cyrillic, Chinese, etc.), as well as computer languages (ASCII, HTTP, XML, Basic, Cobol, Pascal, C++, etc.). Graphics “languages” include moving images, still images, painting, and so forth.

Even within a given language there are different variations, which are referred to herein as styles. Character fonts (Arial, Courier, Gothic, Lucida, Times New Roman, various forms of handwriting, etc.) comprise one type of style, and various sizings and spacings of characters comprise other styles. Graphics have styles as well. Moving images, for example, can be styled as VCR or Beta video, or DVD. Similarly, still images can be styled as hard copy photographs, TIF, GIF, and other computer files.

The sense modality of sound is also deemed herein to include several languages, including the various spoken and written human languages, various languages of music (including, for example, classical music, rock music, punk music, and jazz), animal sounds, industrial sounds, transportation sounds, and electronic sounds such as beeps. Still other languages are contemplated as well, each of which may have several different styles. Within the language of classical music, for example, some of the possible styles include baroque, modern, and so forth.

Technically, the sense modality of taste includes only four possible sensations: sweet, sour, salty, and bitter. In our lexicon these would comprise the different languages of taste, with variations within each sensation comprising the different styles.

In our lexicon, the sense modality of smell includes the “languages” of florals, musks, foods, inorganics, etc.

In our lexicon, the sense modality of touch includes the “languages” of vibration, pressure, temperature, movement, texture, etc.

As can now be appreciated, the terms “sense modality”, “language”, and “style” are each used herein in a very specific manner. Sense modalities are distinguished one from another by the sense organ(s) primarily used to detect the information, while languages are different means of expression within a given sense modality. Within a given sense modality and language, styles refer to variations in expressing information that can be achieved without changing the language.

All of these are distinguishable from the “medium”, which is employed herein to mean the physical device upon which an item of information resides. A photographic image, for example, may reside on a piece of photographic paper, in which case the medium is the paper. The same image may also reside on a computer disk, in which case the medium is the disk. The image can also be transmitted via modem, in which case the medium may be a copper wire.

This is an important distinction because a change in medium does not necessarily mean a change in sense modality or style. For example, when a person talks on a portable telephone, the relevant item of information may be a spoken sentence. The sense modality would be sound, and the language may be English. The style may be very fast, slurred speech. The telephone translates the sounds into an analog or digital language for transmission through the medium of air, with the particular style depending upon the specific protocols of the service provider. Throughout the entire process, however, the sense modality is still considered to be sound, because that is how a human being would perceive the information once it was converted back into an analog form at a frequency that the human being could understand. Similarly, even though the information may be interchanged between digital and analog forms, the information is still considered to maintain the same language and style.

There are many circumstances in which it is known to translate information between sense modalities, and between languages of the same or different sense modalities. For example, jazz can be translated between written notes (sight modality, and possibly Western music transcription as the language) and notes played on an instrument (sound modality, with jazz as the language). Similarly, spoken English (sound modality, English language) can be translated into spoken German (sound modality, German language). Humans are quite adept at performing such translations internally, and as discussed above, computers are beginning to achieve a useful translation capability as well.

In all known instances of which the present inventor has knowledge, however, the information is never wirelessly transmitted to a distant computer for translation, translated at the distant computer (at least 20 kilometers away), wirelessly returned to the location from which it was sent (“locally”, “local”, and “location” all being defined as within a radius of 100 meters), and then expressed locally to the source, all in substantially real time (less than three minutes from initial transmission of the information to expression of the translated information). Examples follow:

- In laboratories that develop voice recognition software, it is presumably known to utilize a central computer for development work, and to access that computer using workstations wired into the central computer. That situation does not, however, involve wireless transmission, and the translating computer is not distal.
- A user loads voice recognition software on a desktop or laptop computer, telephones the computer to record a message, and then accesses that information from a distant computer. In that situation the operation does not occur in substantially real time. The user most likely records several minutes of speech using his telephone, and then downloads a text file translated from the speech using a laptop or other computer.
- One person transmits an e-mail to a recipient, and the recipient causes a computer to “read” the e-mail to him over the telephone. In that situation the total duration between transmitting the e-mail and hearing it spoken is most likely not less than 60 seconds, and the message is most likely not heard locally to the place from which the e-mail was originally sent.
- A user employs a distal central computer for computational purposes. The user enters the equation x=156×2, asks the computer for the answer, and the computer immediately transmits back the answer. That situation falls outside the present invention because the distal computer evaluated the expression rather than translating what was sent to it. If the computer had returned the spoken words “x equals one hundred fifty six times two”, then the computer would have returned a translation.
- A user has a cell phone that is connected to a music web site on the Internet. The user speaks the words “Beethoven's Fifth Symphony”, and the web site transmits a portion of the symphony over the phone. This situation also falls outside the present invention because the distal computer evaluated the words rather than translating them. If the computer had returned the text “Beethoven's Fifth Symphony”, then the computer would have returned a translation.
- A user employs his cell phone to secure a dictionary definition. He speaks a particular word, the cell phone transmits the spoken word to a distal computer, and the distal computer returns the definition. This situation also falls outside the scope of the present invention because the distal computer evaluated the word rather than translating it.
- Voice recognition software is used to operate a cell phone. There are two known possibilities here, neither of which falls within the inventive concepts herein. The first possibility is that the cell phone has some sort of primitive voice recognition. The user says “call home”, and the telephone transmits that speech to a distal computer. The distal computer evaluates the number for “home”, and places the call. This situation again falls outside the present invention because (1) the distal computer evaluated the word “home” rather than translating it, and (2) the distal computer placed the call (or caused it to be placed) rather than sending the telephone number back to the cell phone.
- A user types text into a terminal for transmission to a translation website. The website computer translates the text into another language, and returns the translation to the user.
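
The distinction drawn in the x=156×2 example, between evaluating an expression and translating it, can be made concrete with a short sketch. The Python below is illustrative only: `number_words` and `speak_expression` are hypothetical names, and the sketch assumes integers below 1000 and only the few operators listed. It renders the expression as spoken-style words (a translation) instead of computing 312 (an evaluation).

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
# Hypothetical operator vocabulary; only these symbols are handled.
SYMBOLS = {"=": "equals", "×": "times", "+": "plus", "-": "minus"}


def number_words(n: int) -> str:
    """Spell out an integer 0 <= n < 1000 in English words."""
    if not 0 <= n < 1000:
        raise ValueError("sketch handles 0..999 only")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] if rest == 0 else f"{TENS[tens]} {ONES[rest]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {number_words(rest)}"


def speak_expression(expr: str) -> str:
    """Translate (not evaluate) a simple expression into spoken-style words."""
    words = []
    for token in re.findall(r"\d+|[A-Za-z]+|[=×+\-]", expr):
        if token.isdigit():
            words.append(number_words(int(token)))
        else:
            # Variable names pass through unchanged; operators become words.
            words.append(SYMBOLS.get(token, token))
    return " ".join(words)


print(speak_expression("x=156×2"))
# x equals one hundred fifty six times two
```

An evaluating service would instead return 312, which is the behavior the example above places outside the invention.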

These limitations are not merely design choices. Among other things, the present invention opens up an entire realm of possibilities not previously contemplated. Examples include:

- A cell phone can be used as a dictation machine. Here, a user talks into his cell phone, the cell phone transmits the information to a central mainframe that translates the speech into text, and the mainframe then transmits the text back to the user's cell phone, PDA, or other device for storage. When the user wants to hear past speech, the device that stored the text either reads back the text using local software, or transmits the text (directly or indirectly) back to the central computer, which then translates the text into speech and transmits the speech for playing.
- A cell phone has an output port that connects to various household utilities and other devices. The user plugs a connector into the output port of the cell phone and into a corresponding port in one of the devices. He then talks to the device through the cell phone, using a message such as “turn on at 7 pm and off at 9 pm”. The voice is transmitted to a distal computer, which translates the message into whatever command language the device uses and transmits the message, formatted in that command language, back to the cell phone, which then passes it on to the device. Alternatively or additionally, the device may “talk” to the user through the cell phone.
- A cell phone can be used as a translator. A user speaks into a cell phone in his native language, the cell phone transmits the speech to a distal computer, and the distal computer translates the speech into a second language and returns the translated speech to the cell phone, which then repeats the speech in the second language. A preferred embodiment may even use two cell phones. There, the speaker speaks into his own cell phone, the speech is transmitted to the distal computer, translated, and returned to a local cell phone held by a person who speaks another language.
- A cell phone can be used as an aid for deaf persons. In this scenario a deaf person receives speech in his cell phone, the speech is sent to a distal computer for translation into text, and the text is returned to the cell phone or another device for local display. Such devices could be of great benefit to a deaf person watching television or a movie, attending a play, or simply speaking with other people. The system could also be used to help teach a deaf person to improve his vocalization.
- A similar system could be used for blind people, where the cell phone transmits an image rather than sounds, and receives speech back from the distal computer instead of text. Sample sounds received from the distal computer and played locally may comprise simple but very useful phrases such as “red light”, “curb 20 feet away”, “supermarket”, and so forth. These would simply be voice translations of images that the blind person cannot see. A single, very sophisticated nationwide system could be put in place and made available to millions of deaf or blind individuals, requiring each user to have only relatively inexpensive equipment.
- A cell phone can be used to store information in a computer. Rather than purchase an inexpensive voice recognition software package, a user hooks his cell phone to his desktop, laptop, or hand-held computer. He speaks into the cell phone, the cell phone transmits the speech to a distal computer that translates the speech into text, and the distal computer transmits the text back to the cell phone. The computer downloads the text from the cell phone.
- A cell phone could be used to operate a computer, or even the cell phone itself. Here, the user speaks a command into the cell phone, the cell phone transmits the speech to a distal computer, and the distal computer translates the speech into device commands and transmits the commands back to the cell phone. If appropriate, the computer downloads the commands from the cell phone and executes them. In a simple example, the user could speak the number “714-555-1212” into the cell phone, and the cell phone could transmit that speech to the distal computer, which would translate the speech into the equivalent touch tone pulses and transmit those pulses back to the cell phone. Once received, the cell phone would use those pulses to dial the number.
- A cell phone can be used to look up terms. A user speaks the word “appendix” into his cell phone, the phone transmits the spoken word to a distal computer, and the distal computer translates the word into a picture of an appendix and then transmits the picture back to the cell phone for display. If the cell phone were coupled to a device that dispensed smells or tastes, a similar procedure could be used to translate terms such as “roast chicken” and “bitter” into the sense modalities of smell and taste. The same could also be true of sounds, where the user speaks the words “piano middle c” and the distal computer returns a piano tone at middle c.
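
For the dialing example in the list above, the last step (digits to touch-tone signals) can be sketched with the standard DTMF keypad, in which each key is signaled by one low (row) frequency and one high (column) frequency. This is a minimal sketch that assumes the speech has already been recognized as the digit string; `dtmf_pairs` is a hypothetical name, and the recognition step itself is not shown.

```python
from __future__ import annotations

# Standard DTMF keypad: each key is a (row, column) pair of tones in Hz.
ROW_HZ = {"123A": 697, "456B": 770, "789C": 852, "*0#D": 941}
COL_HZ = {"147*": 1209, "2580": 1336, "369#": 1477, "ABCD": 1633}
SEPARATORS = set("-() ")  # formatting characters that are not dialed


def _lookup(table: dict[str, int], key: str) -> int | None:
    """Return the frequency of the row or column containing key, if any."""
    for keys, hz in table.items():
        if key in keys:
            return hz
    return None


def dtmf_pairs(number: str) -> list[tuple[int, int]]:
    """Hypothetical helper: map a dialable string to (low Hz, high Hz) pairs."""
    pairs = []
    for key in number.upper():
        if key in SEPARATORS:
            continue
        low, high = _lookup(ROW_HZ, key), _lookup(COL_HZ, key)
        if low is None or high is None:
            raise ValueError(f"not a DTMF key: {key!r}")
        pairs.append((low, high))
    return pairs


print(dtmf_pairs("714-555-1212")[:3])
# [(852, 1209), (697, 1209), (770, 1209)]
```

Returning frequency pairs rather than audio samples keeps the sketch small; a phone would synthesize or signal the corresponding tones itself.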

It should be recognized that while each of these examples recites a cell phone, other communication devices could be used as well. The main requirements are that the communication device be capable of receiving an item of information in at least one sense modality and language, and of transmitting that information wirelessly to a distant computer.

It should also be recognized that the distance between the device that initially transmits the information and the distal computer need not be limited to distances greater than 20 kilometers. In other contemplated embodiments the distances could be limited to those greater than 1, 5, 10, 15, 25, 50, or 100 km. Also with respect to distance, the device that receives the translated information may be disposed at other distances from the device that transmits the information to the distal computer. Instead of the two devices being disposed within a radius of 100 meters, the devices may be less than 5, 10, 25, 50, 75, 250, 500, or 1000 meters apart. In a particularly preferred embodiment, the sending and receiving devices are the same device.
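
The distance and timing limits discussed in this description can be expressed as a single parameterized check. The sketch below is illustrative only; `RoundTrip` and `within_limits` are hypothetical names, and the defaults are the values stated earlier (distal computer at least 20 km away, receiving device within 100 m, less than three minutes elapsed). The alternative embodiments correspond to different parameter values.

```python
from dataclasses import dataclass


@dataclass
class RoundTrip:
    """Hypothetical measurements for one item of information."""
    computer_distance_km: float  # sending device to distal computer
    receiver_distance_m: float   # sending device to receiving device
    elapsed_s: float             # first transmission to expression of translation


def within_limits(trip: RoundTrip,
                  min_distal_km: float = 20.0,
                  max_local_m: float = 100.0,
                  max_elapsed_s: float = 180.0) -> bool:
    """True if the trip satisfies the distal, local, and real-time limits."""
    return (trip.computer_distance_km >= min_distal_km
            and trip.receiver_distance_m <= max_local_m
            and trip.elapsed_s < max_elapsed_s)


# Same phone sends and receives, so the local distance is zero.
same_phone = RoundTrip(computer_distance_km=40, receiver_distance_m=0, elapsed_s=12)
print(within_limits(same_phone))                    # True
print(within_limits(same_phone, min_distal_km=50))  # False
```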

It should be still further recognized that the total duration between transmitting the information to the distal computer and receiving back the translation could be limited to times other than less than 3 minutes. Other contemplated times include less than 5, 10, 30, and 45 seconds, and less than 1, 2, 4, 5, and 10 minutes. It may also warrant clarifying that these times refer to a first-in, first-out basis for an item of information. In preferred embodiments the device that sends the information to the distal computer begins transmitting within a few seconds after it begins to receive the information, and the distal computer begins transmitting the translation within a few seconds after the beginning of the translation becomes available. If all goes well, the translation of the beginning of a sentence, and certainly of a paragraph, is being received before the sentence or paragraph has been completely transmitted to the distal computer. This is not to say that the receiving device necessarily utilizes the translation (by displaying, performing, re-transmitting, etc.) immediately upon receipt. Where a single cell phone is used as a foreign language translator, for example, the cell phone may wait until the user stops speaking for a second or two before expressing the translation.
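
The first-in, first-out behavior described above can be illustrated with Python generators, which handle each chunk as soon as it arrives. In this sketch `stream_translate` and `demo` are hypothetical names, the chunks are words, and `str.upper` merely stands in for the distal translation step; the event log shows the translation of the first word being received before the later words are sent.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator


def stream_translate(chunks: Iterable[str],
                     translate: Callable[[str], str]) -> Iterator[str]:
    """Yield the translation of each chunk as soon as that chunk arrives."""
    for chunk in chunks:
        yield translate(chunk)


def demo() -> list[str]:
    events = []

    def sender():
        # Stands in for the sending device transmitting a sentence word by word.
        for word in ["turn", "on", "at", "seven"]:
            events.append(f"sent {word}")
            yield word

    for translated in stream_translate(sender(), str.upper):
        events.append(f"received {translated}")
    return events


print(demo()[:3])
# ['sent turn', 'received TURN', 'sent on']
```

A receiving device that waits for a pause before expressing the translation would buffer these received chunks rather than play each one immediately.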

FIG. 1 depicts an exemplary method 100 of changing the sense modality of an item of information according to the inventive subject matter, in which a communication device 110 in a first location 101 transmits an item of information 112 in a first sense modality and language to a computer 120 located in a distal location 102. The computer executes a program (not shown) that translates the information into a second sense modality and language 122 different from the first sense modality and language. The translated information, now in the second sense modality and language 122, is then transmitted back to the first location 101 to a communication device 111.

It is important to note that translation does not necessarily mean that both the sense modality and the language are changed. Translating the information into a second sense modality and language different from the first sense modality and language means that either the sense modality is changed, or the language is changed, or both. The item of information is preferably speech, and more preferably a sentence of at least 5, 10, or 15 words. Other contemplated items of information include single words and short phrases, as well as what would comprise an entire paragraph if written. Still other contemplated items of information include sounds. It is contemplated, for example, to receive a musical performance into a cell phone, have the cell phone transmit the performed music to a distal computer, have the distal computer translate the performed music into sheet music, and then send the sheet music back to the cell phone for display or storage.
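
The rule that a translation changes the sense modality, the language, or both (while a change of style alone, as defined earlier, does not count) can be written as a small predicate. `Item` and `is_translation` are hypothetical names, and the string labels are illustrative rather than a fixed vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    modality: str    # sight, sound, taste, smell, or touch
    language: str    # a means of expression within that modality
    style: str = ""  # variation within a language; ignored by the predicate


def is_translation(source: Item, result: Item) -> bool:
    """A translation changes the modality, the language, or both."""
    return (source.modality != result.modality
            or source.language != result.language)


performance = Item("sound", "jazz")
sheet_music = Item("sight", "Western music transcription")
print(is_translation(performance, sheet_music))             # True
print(is_translation(Item("sight", "English", "Arial"),
                     Item("sight", "English", "Courier")))  # False
```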

In FIG. 2, a system 200 according to the present invention includes a first communication device 210 in a first location 211 that transmits information in a first sense modality 212 to a computer 220 in a distal location 221. The computer 220 receives the information in the first sense modality and executes a program (not shown) that translates the information from the first sense modality into the second sense modality. Transmitter 230 transmits the information in the second sense modality 222 back to the first communication device 210, or alternatively to a second communication device 230 at the first location 211.

The first communication device can be any suitable device, including a cellular phone, a PC, or a PDA. Where the first communication device is a cellular phone, it is particularly contemplated that such phones may have transient or permanent data storage capabilities of at least 150 kbytes, more preferably at least 1 MByte, and most preferably at least 4 MByte. There are various transient and permanent data storage elements for electronic devices known in the art (e.g., for telephone numbers, addresses, and other related information), all of which are contemplated for use herein. Cellular telephones need not be restricted to a particular communication standard, and exemplary suitable standards include the TDMA, CDMA, GSM, and PDC standards.

Where the communication device comprises a PC or PDA, it is especially preferred that the data transmission to and from the device comprises broadband transmission via a wireless interface. However, in alternative aspects of the inventive subject matter, data transmission may also include internal and external modems, or local networks that may or may not be in data communication with another network. Moreover, many communication devices other than a cellular phone, a PC, and a PDA are also contemplated, and particularly contemplated alternative devices include landline telephones, laptop and palmtop computers, and two-way radios.

The wireless requirement means that what is being transmitted utilizes a wireless means of transmission during at least part of its journey. Wireless includes segments of the journey carried by radio wave, microwave, sonic transmission, and so forth, but does not include segments carried by copper wires or fiber optics. Nevertheless, it is highly preferred that the device transmitting the information to the distal computer has a direct wireless transmission. In other words, the signal leaves the device by a wireless transmission, even though the signal may later take paths involving copper wires or optical carriers. It is also preferable that the device transmitting the information to the distal computer receives the translation directly from wireless signals. In that case, the distal computer may send out the translation across a copper wire or optical carrier, but the signal being received by the device is wireless.

Since all permutations of translation are contemplated, there are literally millions of possible permutations contemplated. This can be demonstrated by considering a very narrow subset of only two of the five sense modalities and a “command modality” (Sight, Sound, and Command), the 20 most common spoken languages, and the 20 most common device languages (for PCs, cell phones, PDAs, VCRs, and so on). Using that small subset, there are 1560 translation permutations (40 languages, each translated into any of the 39 other languages, giving 40 × 39 = 1560), and this calculation ignores most of the spoken and written languages of the earth, as well as most of the command languages, the various languages of music and art, and so forth.
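
The count of 1560 is the number of ordered (source, target) pairs drawn from 40 languages, that is, n(n − 1) with n = 40. The check below uses placeholder language names, since the description does not enumerate the 40 languages.

```python
from itertools import permutations

# Placeholder names: the description does not list the actual languages.
spoken = [f"spoken_{i}" for i in range(20)]
device = [f"device_{i}" for i in range(20)]
languages = spoken + device

# Ordered pairs: translating A into B is distinct from translating B into A.
pairs = list(permutations(languages, 2))
n = len(languages)
print(n, len(pairs), n * (n - 1))  # 40 1560 1560
```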

While it is generally contemplated that information is translated from one sense modality and language into a second sense modality and language different from the first, it is also contemplated that the translation may be into two or more sense modalities and languages. Thus, a person may speak to a crowd of people having different nationalities, the speech may be sent via cell phone to a distal computer, and the distal computer may translate the speech into two or more languages, which are then transmitted back to numerous cell phones in the vicinity of the speaker. In some cases, as mentioned above, the language may be returned as spoken words, and in other instances as written words or characters.

It should also be appreciated that the term “distal computer” includes both single computers and networks. It is very likely, for example, that the methods and systems embodied herein will involve a load-balanced server farm. A telephone company or subsidiary may well operate the server farm.

Thus, specific embodiments and applications of distal translation methods and systems have been disclosed. It should also be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced.

We claim:
 1. A system comprising software operable in cooperation with a mobile phone, within which is contained electronics configurable by the software to: receive, by the mobile phone, human understandable speech; transmit, by the mobile phone, the speech to a service distal from the mobile phone; receive, by the mobile phone, an address related to a target device from the service, wherein the target device is distal from the service and the service uses the transmitted speech to return the address to the mobile phone and the service returns the address without contacting the target device; contact, by the mobile phone, the target device based at least in part on the received address; receive, by the mobile phone, a translation of the speech from the service, wherein the translation of the speech has a different sense modality from that of the speech; and transmit, by the mobile phone, the received translation to the received address of the target device, wherein the address includes information to identify the target device.